Survey on MapReduce Scheduling Algorithms

Authors

Liya Thomas

Quan Chen

Daqiang Zhang

Minyi Guo

Qianni Deng

Song Guo

Xiaoyu Sun

Chen He

Ying Lu

R. Nanduri

N. Maheshwari

A. Reddyraja

Abstract

MapReduce is a programming model used by Google to process large amounts of data in a distributed computing environment, typically on clusters of computers. The data being processed is usually stored in either a file system or a database. MapReduce takes advantage of data locality, processing data on or near the nodes that store it and thereby avoiding unnecessary data transmission. The simplicity of the programming model, together with automatic handling of node failures that hides the complexity of fault tolerance, has led to MapReduce being used for both commercial and scientific applications. As MapReduce clusters have become popular, their scheduling has become one of the important factors to consider. To achieve good performance, a MapReduce scheduler must avoid unnecessary data transmission, and different scheduling algorithms have been proposed to that end. This paper provides an overview of four scheduling algorithms for MapReduce: the scheduling algorithm in Hadoop, the Longest Approximate Time to End (LATE) MapReduce scheduling algorithm, the Self-Adaptive MapReduce (SAMR) scheduling algorithm, and the Enhanced Self-Adaptive MapReduce (ESAMR) scheduling algorithm. The advantages and disadvantages of each algorithm are identified.
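As a rough illustration of the idea behind the LATE name, the sketch below estimates each running task's time to end from its progress so far and picks the task expected to finish last as the candidate for speculative re-execution. The estimate (remaining progress divided by observed progress rate) follows the commonly published description of LATE and is not stated in this abstract; the `RunningTask` type, its fields, and both function names are hypothetical, and a real scheduler applies further thresholds that are omitted here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunningTask:
    """Hypothetical snapshot of one running task (not a Hadoop type)."""
    task_id: str
    progress: float   # fraction of work completed, in [0, 1]
    elapsed_s: float  # seconds since the task started


def estimated_time_to_end(task: RunningTask) -> float:
    """Estimate remaining seconds as (1 - progress) / progress_rate."""
    if task.progress <= 0 or task.elapsed_s <= 0:
        # No progress observed yet, so the estimate is unbounded.
        return float("inf")
    progress_rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / progress_rate


def pick_speculation_candidate(tasks: Iterable[RunningTask]) -> Optional[RunningTask]:
    """Return the unfinished task with the longest estimated time to end."""
    unfinished = [t for t in tasks if t.progress < 1.0]
    if not unfinished:
        return None
    return max(unfinished, key=estimated_time_to_end)
```

For example, a task at 20% progress after 100 seconds has a longer estimated time to end than one at 90% after 90 seconds, so the former would be selected here.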


Similar resources

Scheduling and Energy Efficiency Improvement Techniques for Hadoop Map-reduce: State of Art and Directions for Future Research

MapReduce has become ubiquitous for processing large data volume jobs. As the number and variety of jobs to be executed across heterogeneous clusters are increasing, so is the complexity of scheduling them efficiently to meet required objectives of performance. This report presents a survey of some of the MapReduce scheduling algorithms proposed for such complex scenarios. A taxonomy is provide...


Survey on MapReduce and Scheduling Algorithms in Hadoop

We are living in the data world. It is not easy to measure the total volume of data stored electronically, which is on the order of exabytes or zettabytes and is referred to as Big Data. Such data can be unstructured, structured, or semi-structured, and it is not convenient to store or process with conventional data management methods or on machines with limited computational power. Hadoop system is used to ...


MapReduce Scheduler: A 360-degree view

Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential and can make computation faster. Therefore, there are many scheduling algorithms to discuss based on their characteristics. Moreover, there are many shortcomings to discover in this field. In this article, we present the state-of-the-art scheduling alg...


An Investigation on Scheduling Policies for Cloud-based Software Systems

Background: The rapid diffusion of cloud computing technology has been a focus of interest for enterprises due to its higher scalability and availability and greater elasticity. Nevertheless the limited scheduling mechanisms for running applications in the cloud have been a major challenge. Aim: This project introduces an effective scheduling algorithm, which attempts to maximize cloud resource...


Evaluating map reduce tasks scheduling algorithms over cloud computing infrastructure

Efficiently scheduling MapReduce tasks is considered one of the major challenges facing MapReduce frameworks. Many algorithms have been introduced to tackle this issue, most of which focus on the data locality property for task scheduling. Data locality may cause lower physical resource utilization in non-virtualized clusters and higher power consumption. Virtualized clust...



Publication date: 2014
